AITopics | html element

html element

1 Details about the observation formats. Figure 1: Example of the observation of WebShop. The observation of WebShop is simplified based on the text_rich

Neural Information Processing Systems, Feb-18-2026, 00:22:25 GMT

The observation of WikiHow is represented in exactly the same way as in Zhang et al. [2023].

Table 1: Patterns of WebShop pages

Pattern      Description
search       The page to search for an item
itemlisting  The page listing the search results
item         The information page of a specific item
others       The item description page, item feature page, and review page

The similarity lookup table is defined in Table 2.

Table 2: Lookup table of the page similarity of WebShop

             search  itemlisting  item  others
search       1       0            0     0
itemlisting  0       1            0     0
item         0       0            1     0.3
others       0       0            0.3   1

2.2 Lookup table of the instruction similarity function of WikiHow

Table 3: Patterns of WikiHow instructions

Pattern Name  Pattern Template
search        Search an article to learn . . .

Owing to budget limits, a subset of only 20 tasks is sampled from the full test set. The visualization is available in Figure 2. It can be seen that the performance of R ... However, the performance seems to saturate, which may be attributed to the limited number of active exemplars and training tasks. The average reward saturates later than the success rate. Double Q-Learning [van Hasselt, 2010] is usually leveraged to ameliorate over-estimation in lookup-based Q-Learning.
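The page similarity lookup in Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `PATTERNS`, `_OFF_DIAGONAL`, and `page_similarity` are hypothetical, and the sketch assumes the table is symmetric with 1 on the diagonal, as the listed values suggest.

```python
# Page patterns from Table 1; similarity values from Table 2.
PATTERNS = ("search", "itemlisting", "item", "others")

# Only the nonzero off-diagonal pair is stored. Using a frozenset as the key
# makes the lookup order-independent (assumption: the table is symmetric).
_OFF_DIAGONAL = {frozenset(("item", "others")): 0.3}


def page_similarity(a: str, b: str) -> float:
    """Return the Table 2 similarity between two WebShop page patterns."""
    if a not in PATTERNS or b not in PATTERNS:
        raise ValueError(f"unknown page pattern: {a!r} or {b!r}")
    if a == b:
        return 1.0
    return _OFF_DIAGONAL.get(frozenset((a, b)), 0.0)
```

Storing only the nonzero off-diagonal entry keeps the table compact; a full 4x4 dictionary would work equally well for a table this small.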

artificial intelligence, machine learning, webshop, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Not All Visitors are Bilingual: A Measurement Study of the Multilingual Web from an Accessibility Perspective

Bhuiyan, Masudul Hasan Masud, Varvello, Matteo, Zaki, Yasir, Staicu, Cristian-Alexandru

arXiv.org Artificial Intelligence, Aug-27-2025

English is the predominant language on the web, powering nearly half of the world's top ten million websites. Support for multilingual content is nevertheless growing, with many websites increasingly combining English with regional or native languages in both visible content and hidden metadata. This multilingualism introduces significant barriers for users with visual impairments, as assistive technologies like screen readers frequently lack robust support for non-Latin scripts and misrender or mispronounce non-English text, compounding accessibility challenges across diverse linguistic contexts. Yet, large-scale studies of this issue have been limited by the lack of comprehensive datasets on multilingual web content. To address this gap, we introduce LangCrUX, the first large-scale dataset of 120,000 popular websites across 12 languages that primarily use non-Latin scripts. Leveraging this dataset, we conduct a systematic analysis of multilingual web accessibility and uncover widespread neglect of accessibility hints. We find that these hints often fail to reflect the language diversity of visible content, reducing the effectiveness of screen readers and limiting web accessibility. We finally propose Kizuki, a language-aware automated accessibility testing extension to account for the limited utility of language-inconsistent accessibility hints.
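To make "language-inconsistent accessibility hints" concrete, the sketch below flags a hint (for example, an image's alt text) whose dominant script differs from that of the surrounding visible text. It is an illustrative assumption only, not the method used by LangCrUX or Kizuki: it compares Unicode scripts rather than languages, and the function names are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Optional


def dominant_script(text: str) -> Optional[str]:
    """Return the most common script among the letters in `text`.

    The script is approximated by the first word of each character's
    Unicode name, e.g. "LATIN", "CYRILLIC", "DEVANAGARI", "CJK".
    Returns None when the text contains no letters.
    """
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    )
    return counts.most_common(1)[0][0] if counts else None


def hint_matches_content(hint: str, visible_text: str) -> bool:
    """True when the hint has letters and shares the visible text's script."""
    hint_script = dominant_script(hint)
    return hint_script is not None and hint_script == dominant_script(visible_text)
```

For instance, `hint_matches_content("Company logo", "नमस्ते दुनिया")` returns `False`, because the hint is written in Latin script while the visible content is in Devanagari. A script check cannot tell apart languages that share a script, so a real tool would need language identification on top of it.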

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

arXiv: 2508.18328

Country:

Europe (1.00)
North America > United States > New York (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Human Computer Interaction (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.88)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

OASBuilder: Generating OpenAPI Specifications from Online API Documentation with Large Language Models

Lazar, Koren, Vetzler, Matan, Kate, Kiran, Tsay, Jason, Boaz, David, Gupta, Himanshu, Shinnar, Avraham, Vallam, Rohith D, Amid, David, Goldbraich, Esther, Uziel, Guy, Laredo, Jim, Tavor, Ateret Anaby

arXiv.org Artificial Intelligence, Jul-9-2025

AI agents and business automation tools interacting with external web services require standardized, machine-readable information about their APIs in the form of API specifications. However, the information about APIs available online is often presented as unstructured, free-form HTML documentation, requiring external users to spend significant time manually converting it into a structured format. To address this, we introduce OASBuilder, a novel framework that transforms long and diverse API documentation pages into consistent, machine-readable API specifications. This is achieved through a carefully crafted pipeline that integrates large language models and rule-based algorithms which are guided by domain knowledge of the structure of documentation webpages. Our experiments demonstrate that OASBuilder generalizes well across hundreds of APIs, and produces valid OpenAPI specifications that encapsulate most of the information from the original documentation. OASBuilder has been successfully implemented in an enterprise environment, saving thousands of hours of manual effort and making hundreds of complex enterprise APIs accessible as tools for LLMs.
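As a rough picture of the target format, the sketch below assembles a minimal OpenAPI 3.0 document from endpoint records of the kind one might extract from a documentation page. It does not reproduce the OASBuilder pipeline: the input record layout and the `build_openapi` helper are hypothetical, and only the output keys (`openapi`, `info`, `paths`, `parameters`, `responses`) follow the OpenAPI specification.

```python
import json


def build_openapi(title: str, version: str, endpoints: list) -> dict:
    """Assemble a minimal OpenAPI 3.0 document from extracted endpoint records.

    Each record is a dict with "path", "method", "summary", and an optional
    "parameters" list (a hypothetical intermediate format for this sketch).
    """
    paths = {}
    for ep in endpoints:
        operation = {
            "summary": ep["summary"],
            "parameters": [
                {
                    "name": p["name"],
                    "in": p["in"],  # "query", "path", "header", or "cookie"
                    "required": p["required"],
                    "schema": {"type": p["type"]},
                }
                for p in ep.get("parameters", [])
            ],
            "responses": {"200": {"description": "Successful response"}},
        }
        # Several HTTP methods may share one path, so merge per path.
        paths.setdefault(ep["path"], {})[ep["method"].lower()] = operation
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": version},
        "paths": paths,
    }


if __name__ == "__main__":
    example = [{
        "path": "/pets",
        "method": "GET",
        "summary": "List pets",
        "parameters": [
            {"name": "limit", "in": "query", "required": False, "type": "integer"}
        ],
    }]
    print(json.dumps(build_openapi("Pet API", "1.0.0", example), indent=2))
```

The hard part that the abstract describes, locating endpoints and parameters in free-form HTML, happens before this step and is not covered here.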

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

arXiv: 2507.05316

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

The Synergy of Automated Pipelines with Prompt Engineering and Generative AI in Web Crawling

Huang, Chau-Jian

arXiv.org Artificial Intelligence, Dec-29-2024

Web crawling is a critical technique for extracting online data, yet it poses challenges due to webpage diversity and anti-scraping mechanisms. This study investigates the integration of two generative AI tools, Claude AI (Sonnet 3.5) and ChatGPT-4.0, with prompt engineering to automate web scraping. Using two prompts, PROMPT I (general inference, tested on Yahoo News) and PROMPT II (element-specific, tested on Coupons.com), we evaluate the code quality and performance of AI-generated scripts. Claude AI consistently outperformed ChatGPT-4.0 in script quality and adaptability, as confirmed by predefined evaluation metrics, including functionality, readability, modularity, and robustness. Performance data were collected through manual testing and structured scoring by three evaluators. Visualizations further illustrate Claude AI's superiority. Anti-scraping solutions, including undetected_chromedriver, Selenium, and fake_useragent, were incorporated to enhance performance. This paper demonstrates how generative AI combined with prompt engineering can simplify and improve web scraping workflows.

chatgpt, claude ai, prompt ii, (11 more...)

arXiv.org Artificial Intelligence

arXiv: 2502.15691

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.47)

Technology:

Information Technology > Data Science > Data Mining > Web Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.94)

Steward: Natural Language Web Automation

Tang, Brian, Shin, Kang G.

arXiv.org Artificial Intelligence, Sep-23-2024

Recently, large language models (LLMs) have demonstrated exceptional capabilities in serving as the foundation for AI assistants. One emerging application of LLMs, navigating through websites and interacting with UI elements across various web pages, remains somewhat underexplored. We introduce Steward, a novel LLM-powered web automation tool designed to serve as a cost-effective, scalable, end-to-end solution for automating web interactions. Traditional browser automation frameworks like Selenium, Puppeteer, and Playwright are not scalable for extensive web interaction tasks, such as studying recommendation algorithms on platforms like YouTube and Twitter. These frameworks require manual coding of interactions, limiting their utility in large-scale or dynamic contexts. Steward addresses these limitations by integrating LLM capabilities with browser automation, allowing for natural language-driven interaction with websites. Steward operates by receiving natural language instructions and reactively planning and executing a sequence of actions on websites, looping until completion, making it a practical tool for developers and researchers to use. It achieves high efficiency, completing actions in 8.52 to 10.14 seconds at a cost of $0.028 per action or an average of $0.18 per task, which is further reduced to 4.8 seconds and $0.022 through a caching mechanism. It runs tasks on real websites with a 40% completion success rate. We discuss various design and implementation challenges, including state representation, action sequence selection, system responsiveness, detecting task completion, and caching implementation.
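The plan-execute loop with caching described above can be sketched as follows. This is a toy outline under stated assumptions, not Steward's implementation: `plan` stands in for an LLM call, `execute` for a browser action, the cache key `(instruction, state)` is a guess at what could be cached, and the `"DONE"` sentinel is a hypothetical completion signal.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a page state and an action are both plain strings here.
Planner = Callable[[str, str, List[str]], str]
Executor = Callable[[str, str], str]


def run_task(
    instruction: str,
    initial_state: str,
    plan: Planner,
    execute: Executor,
    cache: Dict[Tuple[str, str], str],
    max_steps: int = 10,
) -> List[str]:
    """Loop: pick the next action, run it, and stop when the planner says DONE.

    A cached action for the same (instruction, state) pair is reused, which
    skips the planner call (the expensive step in an LLM-driven agent).
    """
    state = initial_state
    history: List[str] = []
    for _ in range(max_steps):
        key = (instruction, state)
        if key in cache:
            action = cache[key]
        else:
            action = plan(instruction, state, history)
            cache[key] = action
        if action == "DONE":
            break
        history.append(action)
        state = execute(state, action)
    return history
```

With a toy planner that maps the state `"home"` to `"click search"` and `"results"` to `"DONE"`, a first run calls the planner twice and a second run with the same cache calls it zero times. The `max_steps` bound guards against a planner that never signals completion.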

language model, steward, website, (15 more...)

arXiv.org Artificial Intelligence

arXiv: 2409.15441

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > New York (0.04)
North America > United States > Illinois (0.04)
(9 more...)

Genre: Research Report (0.81)

Industry:

Consumer Products & Services > Travel (1.00)
Information Technology > Security & Privacy (0.93)
Transportation > Passenger (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Dual-View Visual Contextualization for Web Navigation

Kil, Jihyung, Song, Chan Hee, Zheng, Boyuan, Deng, Xiang, Su, Yu, Chao, Wei-Lun

arXiv.org Artificial Intelligence, Feb-6-2024

Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, making it hard to select the right (sequence of) actions. In this paper, we propose to contextualize HTML elements through their "dual views" in webpage screenshots: each HTML element has its corresponding bounding box and visual content in the screenshot. We build upon the insight that web developers tend to arrange task-related elements nearby on webpages to enhance user experiences, and propose to contextualize each element with its neighbor elements, using both textual and visual features. The resulting representations of HTML elements are more informative for the agent to take action. We validate our method on the recently released Mind2Web dataset, which features diverse navigation domains and tasks on real-world websites. Our method consistently outperforms the baseline in all the scenarios, including cross-task, cross-website, and cross-domain ones.
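The idea of contextualizing an element with its spatial neighbors can be sketched with bounding boxes alone. The example below is a simplified, text-only illustration under assumptions: the `Element` type, the center-distance metric, and the joined-string output are hypothetical, and the visual features the paper also uses are left out.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    element_id: str
    text: str
    box: Tuple[float, float, float, float]  # (x, y, width, height) in pixels


def center(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Return the center point of a bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def nearest_neighbors(target: Element, elements: List[Element], k: int = 2) -> List[Element]:
    """Return the k elements whose box centers are closest to the target's."""
    others = [e for e in elements if e.element_id != target.element_id]
    others.sort(key=lambda e: math.dist(center(e.box), center(target.box)))
    return others[:k]


def contextualize(target: Element, elements: List[Element], k: int = 2) -> str:
    """Join the target's text with the text of its k nearest neighbors."""
    neighbors = nearest_neighbors(target, elements, k)
    return " | ".join([target.text] + [n.text for n in neighbors])
```

A bare "Search" button becomes easier to interpret once the nearby query field and cart link are attached to it, which is the intuition behind using neighbors as context.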

html document, neighbor, ual -vcr vnei, (14 more...)

arXiv.org Artificial Intelligence

arXiv: 2402.04476

Country:

North America > Canada > Ontario > Toronto (0.05)
North America > United States > New York (0.05)
North America > United States > Ohio (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors

Montaruli, Biagio, Demetrio, Luca, Pintor, Maura, Compagna, Luca, Balzarotti, Davide, Biggio, Battista

arXiv.org Artificial Intelligence, Oct-13-2023

Machine-learning phishing webpage detectors (ML-PWD) have been shown to suffer from adversarial manipulations of the HTML code of the input webpage. Nevertheless, the attacks recently proposed have demonstrated limited effectiveness because they do not optimize the usage of the adopted manipulations and focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations that modify the HTML code of the input phishing webpage without compromising its maliciousness and visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then select which manipulations should be applied to bypass the target detector by a query-efficient black-box optimization algorithm. Our experiments show that our attacks are able to raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus overcoming the weaker attacks developed in previous work, and enabling a much fairer robustness evaluation of ML-PWD.

html element, manipulation, webpage, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3605764.3623920

arXiv: 2310.03166

Country:

Europe > Denmark > Capital Region > Copenhagen (0.05)
North America > United States > New York > New York County > New York City (0.05)
Europe > Italy > Sardinia > Cagliari (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

GitHub - kagermanov27/daath-ai-parser: Daath AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements.

#artificialintelligence, Dec-10-2022, 09:40:30 GMT

Daath AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements. It is built on top of FastAPI. It can be set up as a server and called from any language. It is currently hosted at ai.kagermanov.com. You need to register a free account first.
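The description does not say how the visible text is collected before it is sent to the model. One possible preprocessing step, sketched here with Python's standard `html.parser` module, is to gather text nodes while skipping tags whose content is never rendered. The class and function names are hypothetical, this is not the project's code, and the sketch ignores CSS-based hiding such as `display: none`.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping tags whose content is not rendered."""

    HIDDEN_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside a hidden tag
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._hidden_depth:
            self.chunks.append(text)


def visible_text(html: str) -> List[str]:
    """Return the non-empty visible text chunks of an HTML string, in order."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Calling `visible_text` on a product snippet yields a flat list of strings such as a title and a price, which is the kind of input a language model could then be asked to structure.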

large language model, machine learning, natural language, (19 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

Create an Image Classification Tool With ml5.js and HTML

#artificialintelligence, Nov-6-2022, 21:31:08 GMT

Machine learning is a fundamental technology in the modern world. Computers can learn to recognize images, create artwork, and even write their own code, all with minimal human intervention. But how does machine learning work and how can you use it yourself? Machine learning is a relatively simple concept. Computer systems can learn and adapt by analyzing existing data patterns from pools of information.

ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers

Sun, Haitian, Cohen, William W., Salakhutdinov, Ruslan

arXiv.org Artificial Intelligence, Oct-13-2021

We describe a Question Answering (QA) dataset that contains complex questions with conditional answers, i.e., the answers are only applicable when certain conditions apply. We call this dataset ConditionalQA. In addition to conditional answers, the dataset also features: (1) long context documents with information that is related in logically complex ways; (2) multi-hop questions that require compositional logical reasoning; (3) a combination of extractive questions, yes/no questions, questions with multiple answers, and not-answerable questions; (4) questions asked without knowing the answers. We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions. We believe that this dataset will motivate further research in answering complex questions over long documents. Data and leaderboard are publicly available at https://github.com/haitian-sun/ConditionalQA.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

arXiv: 2110.06884

Country:

Europe > United Kingdom (0.28)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (0.47)
Government > Regional Government (0.46)
Education > Assessment & Standards > Student Performance (0.42)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
